{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# _Please make sure the previous Endpoints are deleted before proceeding.  Otherwise, you will see a ResourceLimitExceeded error._\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Deploy Multi-Model Endpoint Using Two Trained TensorFlow Models\n",
    "\n",
    "https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/tensorflow/deploying_tensorflow_serving.rst#deploying-more-than-one-model-to-your-endpoint"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you have a large number of similar models that you can serve through a shared serving container - and don’t need to access all the models at the same time - you can use SageMaker’s multi-model endpoint (MME) capability. When there is a long tail of ML models that are infrequently accessed, using one multi-model endpoint can efficiently serve inference traffic and enable significant cost savings. \n",
    "\n",
    "Multi-model endpoints can automatically load and unload models based on traffic and resource utilization. For example, if traffic to model1 goes to zero and model2 traffic spikes, SageMaker will dynamically unload model1 and load another instance of model2. \n",
    "\n",
    "While MME lets you deploy multiple models to a single endpoint and serve them using a single container, you can invoke a specific model by specifying the target model name as a parameter in your prediction request."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"img/multi_model.png\" width=\"80%\" align=\"left\">"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install -q --upgrade pip\n",
    "!pip install -q wrapt --upgrade --ignore-installed\n",
    "!pip install -q tensorflow==2.1.0\n",
    "!pip install -q transformers==2.8.0"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import boto3\n",
    "import sagemaker\n",
    "import pandas as pd\n",
    "\n",
    "sess   = sagemaker.Session()\n",
    "bucket = sess.default_bucket()\n",
    "role = sagemaker.get_execution_role()\n",
    "region = boto3.Session().region_name\n",
    "\n",
    "sm = boto3.Session().client(service_name='sagemaker', region_name=region)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Configure First Training Job"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%store -r training_job_name"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(training_job_name)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model1_s3_uri = 's3://{}/{}/output/model.tar.gz'.format(bucket, training_job_name)\n",
    "print(model1_s3_uri)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Configure Second Training Job\n",
    "For now, let's re-use the same model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%store -r training_job_name"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(training_job_name)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model2_s3_uri = 's3://{}/{}/output/model.tar.gz'.format(bucket, training_job_name)\n",
    "print(model2_s3_uri)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#  Create a `multi/` Directory Structure\n",
    "Put them in ./multi/model1 and model2/ directory structure shown here\n",
    "\n",
    "└── multi\n",
    "\n",
    "    ├── model1\n",
    "  \n",
    "    │   └── <version number>\n",
    "    \n",
    "    │       ├── saved_model.pb\n",
    "  \n",
    "    │       └── variables\n",
    "  \n",
    "    │           └── ...\n",
    "  \n",
    "    └── model2\n",
    "  \n",
    "        └── <version number>\n",
    "      \n",
    "            ├── saved_model.pb\n",
    "          \n",
    "            └── variables\n",
    "          \n",
    "                └── ...\n",
    "              "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!rm -rf ./multi/"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!mkdir -p ./multi/model1\n",
    "!mkdir -p ./multi/model2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!aws s3 cp $model1_s3_uri model1.tar.gz\n",
    "!aws s3 cp $model2_s3_uri model2.tar.gz"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "!tar xvf model1.tar.gz -C ./multi/model1\n",
    "!tar xvf model2.tar.gz -C ./multi/model2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!ls -al ./multi/model1/tensorflow/saved_model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!mv ./multi/model1/tensorflow/saved_model/* multi/model1/\n",
    "!mv ./multi/model2/tensorflow/saved_model/* multi/model2/"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!rm -rf multi/model1/tensorflow/\n",
    "!rm -rf multi/model1/transformers/\n",
    "!rm -rf multi/model2/tensorflow/\n",
    "!rm -rf multi/model2/transformers/"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!ls -alR ./multi/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Package Both Models into a Single multi.tar.gz\n",
    "Note:  This may take a minute.  The models are large."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!tar -czvf multi.tar.gz multi/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Upload the New Archive to S3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "multi_model_s3_uri = 's3://{}/{}/multi'.format(bucket, training_job_name)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!aws s3 cp multi.tar.gz $multi_model_s3_uri/"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!aws s3 ls --recursive $multi_model_s3_uri/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Create SageMaker Model from the Multi-Model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sagemaker.tensorflow.serving import Model, Predictor\n",
    "\n",
    "# For multi-model endpoints, you should set the default \n",
    "#    model name in this environment variable. \n",
    "# If it isn't set, the endpoint will work, but the model\n",
    "#    it will select as default is unpredictable.\n",
    "env = {\n",
    "  'SAGEMAKER_TFS_DEFAULT_MODEL_NAME': 'model1'  # <== This must match the directory\n",
    "}\n",
    "\n",
    "model_data = '{}/multi.tar.gz'.format(multi_model_s3_uri)\n",
    "model = Model(model_data=model_data, \n",
    "              role=role, \n",
    "              framework_version='2.1.0', \n",
    "              env=env)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Deploy the Multi-Model as a SageMaker Endpoint\n",
    "The predictor returned by `model.deploy()` is only for the default model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "deployed_model = model.deploy(instance_type='ml.m5.large',\n",
    "                              initial_instance_count=1,\n",
    "                              wait=False)\n",
    "\n",
    "multi_model_endpoint_name = deployed_model.endpoint\n",
    "\n",
    "print('Endpoint name:  {}'.format(multi_model_endpoint_name))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from IPython.core.display import display, HTML\n",
    "\n",
    "display(HTML('<b>Review <a href=\"https://console.aws.amazon.com/sagemaker/home?region={}#/endpoints/{}\">REST Endpoint</a></b>'.format(region, multi_model_endpoint_name)))\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "client = boto3.client('sagemaker')\n",
    "waiter = client.get_waiter('endpoint_in_service')\n",
    "waiter.wait(EndpointName=multi_model_endpoint_name)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Simulate a Prediction from an Application"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "class RequestHandler(object):\n",
    "    import json\n",
    "    \n",
    "    def __init__(self, tokenizer, max_seq_length):\n",
    "        self.tokenizer = tokenizer\n",
    "        self.max_seq_length = max_seq_length\n",
    "\n",
    "    def __call__(self, instances):\n",
    "        transformed_instances = []\n",
    "\n",
    "        for instance in instances:\n",
    "            encode_plus_tokens = tokenizer.encode_plus(instance,\n",
    "                                                       pad_to_max_length=True,\n",
    "                                                       max_length=self.max_seq_length)\n",
    "\n",
    "            input_ids = encode_plus_tokens['input_ids']\n",
    "            input_mask = encode_plus_tokens['attention_mask']\n",
    "            segment_ids = [0] * self.max_seq_length\n",
    "\n",
    "            transformed_instance = {\"input_ids\": input_ids, \n",
    "                                    \"input_mask\": input_mask, \n",
    "                                    \"segment_ids\": segment_ids}\n",
    "\n",
    "            transformed_instances.append(transformed_instance)\n",
    "\n",
    "        transformed_data = {\"instances\": transformed_instances}\n",
    "\n",
    "        return json.dumps(transformed_data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "class ResponseHandler(object):\n",
    "    import json\n",
    "    import tensorflow as tf\n",
    "    \n",
    "    def __init__(self, classes):\n",
    "        self.classes = classes\n",
    "    \n",
    "    def __call__(self, response, accept_header):\n",
    "        import tensorflow as tf\n",
    "\n",
    "        response_body = response.read().decode('utf-8')\n",
    "\n",
    "        response_json = json.loads(response_body)\n",
    "\n",
    "        log_probabilities = response_json[\"predictions\"]\n",
    "\n",
    "        predicted_classes = []\n",
    "\n",
    "        # Convert log_probabilities => softmax (all probabilities add up to 1) => argmax (final prediction)\n",
    "        for log_probability in log_probabilities:\n",
    "            softmax = tf.nn.softmax(log_probability)    \n",
    "            predicted_class_idx = tf.argmax(softmax, axis=-1, output_type=tf.int32)\n",
    "            predicted_class = self.classes[predicted_class_idx]\n",
    "            predicted_classes.append(predicted_class)\n",
    "\n",
    "        return predicted_classes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "from sagemaker.tensorflow.serving import Predictor\n",
    "from transformers import DistilBertTokenizer\n",
    "\n",
    "tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')\n",
    "\n",
    "request_handler = RequestHandler(tokenizer=tokenizer,\n",
    "                                 max_seq_length=128)\n",
    "\n",
    "response_handler = ResponseHandler(classes=[1, 2, 3, 4, 5])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "predictor_model1 = Predictor(endpoint_name=multi_model_endpoint_name,\n",
    "                             sagemaker_session=sess,\n",
    "                             serializer=request_handler,\n",
    "                             deserializer=response_handler,\n",
    "                             content_type='application/json',\n",
    "                             model_name='model1',\n",
    "                             model_version=0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "predictor_model2 = Predictor(endpoint_name=multi_model_endpoint_name,\n",
    "                             sagemaker_session=sess,\n",
    "                             serializer=request_handler,\n",
    "                             deserializer=response_handler,\n",
    "                             content_type='application/json',\n",
    "                             model_name='model2',\n",
    "                             model_version=0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import tensorflow as tf\n",
    "import json\n",
    "    \n",
    "reviews = [\n",
    "            \"This is great!\", \n",
    "            \"This is not good.\"\n",
    "          ]\n",
    "\n",
    "predicted_classes_model1 = predictor_model1.predict(reviews)\n",
    "\n",
    "for predicted_class, review in zip(predicted_classes_model1, reviews):\n",
    "    print('[Predicted Star Rating: {}]'.format(predicted_class), review)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import tensorflow as tf\n",
    "import json\n",
    "    \n",
    "reviews = [\n",
    "            \"This is great!\", \n",
    "            \"This is not good.\"\n",
    "          ]\n",
    "\n",
    "predicted_classes_model2 = predictor_model2.predict(reviews)\n",
    "\n",
    "for predicted_class, review in zip(predicted_classes_model2, reviews):\n",
    "    print('[Predicted Star Rating: {}]'.format(predicted_class), review)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Delete Endpoint"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "client = boto3.client('sagemaker')\n",
    "\n",
    "client.delete_endpoint(\n",
    "    EndpointName=multi_model_endpoint_name\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%javascript\n",
    "\n",
    "Jupyter.notebook.session.delete();"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "conda_python3",
   "language": "python",
   "name": "conda_python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
